Taxonomy of Stock Market Indices 
We investigate sets of financial, non-redundant and nonsynchronously recorded time series. The sets are composed of a number of stock market indices located all over the world in five continents. By properly selecting the time horizon of returns and by using a reference currency, we find a meaningful taxonomy. The detection of such a taxonomy proves that interpretable information can be stored in a set of nonsynchronously recorded time series.
One key aspect of information theory is that unpredictable time series, namely time series which are not redundant or only poorly redundant, are characterized by statistical properties which are almost indistinguishable from the ones observed in basic random processes such as, for example, Bernoulli or Markov processes. Within this theoretical framework it may appear paradoxical that some time series generated in complex systems, which play a vital role in biological and economic systems, are essentially unpredictable and characterized by a negligible or quite low redundancy. Prominent examples are the time series of the price changes of assets traded in financial markets and the symbolic series of coding regions of DNA.

In this letter we show that an approximately non-redundant, nonsynchronously recorded time series may carry different levels of interpretable information, provided that it can be analyzed synchronously together with other time series of the same kind. In other words, we show that in addition to the information related to the redundant nature of a time series, other sources of information may be present in a non-redundant time series, and that such additional information can be extracted by comparing the considered time series with analogous ones. Our work focuses on time series monitoring financial markets located all over the world. With our study, we aim to detect in a quantitative way the existence of links between different stock markets.

It is worth pointing out that the study of the dynamics of stock exchange indices located all over the world has additional levels of complexity with respect to, for example, the dynamics of a portfolio of stocks traded in a single stock market. To cite just the two most prominent ones: (i) stock markets located all over the world have different opening and closing hours; and (ii) transactions in different markets are made in different currencies, which themselves fluctuate with respect to one another. It is therefore important to quantify the degree of similarity between the dynamics of stock indices of nonsynchronous markets trading in different currencies.

Here we present a study showing that meaningful information can be extracted from a set of stock index time series. In our study, the different levels of interdependence and complexity of the data are elucidated by applying the same methodology several times to modified sets of the investigated time series. We are able to show that it is possible to extract a group of taxonomies that directly reflects the geographical and economic links between several countries over the years. This is obtained by using only the almost non-redundant time series of several stock indices of financial markets located all over the world.

The efficient market paradigm states that stock returns of financial price time series are unpredictable. Within this paradigm, the time evolution of stock returns is well described by a random process. Several empirical analyses of real market data have shown that return time series are approximately described by unpredictable, non-redundant time series. The absence of redundancy is not complete in real markets, and the presence of residual redundancy has been detected. The degree of redundancy must be kept to a minimum to rule out the presence of arbitrage opportunities.

We investigate two sets of data: (i) the nonsynchronous time evolution of n = 24 daily stock market indices computed in local currencies during the time period from January 1988 to December 1996, and (ii) the daily closing values of the 51 Morgan Stanley Capital International (MSCI) country indices, computed in local currencies or in USA dollars, in the time period from January 1996 to December 1999. The stock indices used in our research belong to stock markets distributed all over the world in five continents.

We already stated that a set of stock index time series is essentially different from a portfolio of stocks traded in a single stock market. Specifically, the fact that trading may occur at different times in two different cities implies that some markets are open while others are closed (the most prominent example concerns the New York and Tokyo stock markets). This makes a rigorously synchronous analysis of a large number of stock indices located all over the world impossible. An analysis of daily data of, say, closing values may induce spurious correlations introduced just by the specific time at which the variables are recorded. The effects of nonsynchronous trading in time series analysis are well documented in the economic literature [11]-[13]. In fact, different degrees of correlation between the New York and Tokyo markets are estimated depending on whether one considers the close-to-close returns of the two markets or the close-to-open returns. In particular, it has been empirically detected that the highest degree of correlation between these two markets is observed between the open-to-close return of the New York stock exchange at day t and the open-to-close return of the Tokyo stock market at day t + 1.
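The timing effect described above can be illustrated with a toy simulation. This is a minimal sketch that does not use the data analyzed here: we assume, hypothetically, a common daily shock that New York absorbs on day t while Tokyo, which closes before New York opens, absorbs it only in its next session, and we add independent local noise to both markets. Under this assumption the same-day correlation of daily returns is close to zero, while New York at day t and Tokyo at day t + 1 are clearly correlated.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    return cov / math.sqrt(vx * vy)


random.seed(0)
T = 5000
# Hypothetical common "world news" shock arriving on each calendar day.
news = [random.gauss(0, 1) for _ in range(T + 1)]

# Toy assumption: New York reacts to the shock of day t on day t, while Tokyo
# (already closed) reacts to it only on day t + 1. Both add local noise.
ny = [news[t] + random.gauss(0, 1) for t in range(1, T + 1)]
tokyo = [news[t - 1] + random.gauss(0, 1) for t in range(1, T + 1)]

same_day = pearson(ny, tokyo)          # both series observed on the same day
lagged = pearson(ny[:-1], tokyo[1:])   # New York day t vs Tokyo day t + 1
print(f"same-day: {same_day:.2f}, NY(t) vs Tokyo(t+1): {lagged:.2f}")
```

With equal shock and noise variances the lagged correlation is close to 0.5, while the same-day value fluctuates around zero: a synchronous daily analysis would miss the link entirely.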

The aim of this study is to consider a large set of indices. It is of course impossible to collect a set of indices located all over the world which are synchronous with respect to their opening and closing hours. This intrinsic limitation motivates us to consider a weekly time horizon, over which the effect of the hourly mismatch of our nonsynchronous data is minimized.
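A minimal sketch of the weekly sampling, assuming that the ISO calendar week (Python's `date.isocalendar`) defines the trading week and that the daily input is a date-sorted list of closing values. The function name `weekly_log_returns` is hypothetical. A week without any trading day is simply skipped, so the following return would then span two weeks.

```python
import math
from datetime import date


def weekly_log_returns(daily):
    """daily: list of (date, closing_value) pairs sorted by date.

    Keeps the last closing value of each ISO week as S(t) and returns
    Y(t) = ln S(t) - ln S(t - 1) for consecutive recorded weeks."""
    last_close = {}
    for day, value in daily:
        year, week, _ = day.isocalendar()
        last_close[(year, week)] = value  # later days overwrite earlier ones
    weeks = sorted(last_close)
    closes = [last_close[w] for w in weeks]
    return [math.log(b) - math.log(a) for a, b in zip(closes, closes[1:])]


# Two trading weeks of January 1996 (illustrative values, not real data).
daily = [(date(1996, 1, 2), 100.0), (date(1996, 1, 5), 102.0),
         (date(1996, 1, 8), 101.0), (date(1996, 1, 12), 104.0)]
weekly = weekly_log_returns(daily)  # a single return: ln(104) - ln(102)
```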

We aim to discover the presence of interpretable information in a set of time series. We proceed by determining a quasi-synchronous correlation coefficient of the weekly difference of the logarithm of the closing value of the indices. The correlation coefficient is

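$$\rho_{ij} = \frac{\langle Y_i Y_j \rangle - \langle Y_i \rangle \langle Y_j \rangle}{\sqrt{\left(\langle Y_i^2 \rangle - \langle Y_i \rangle^2\right)\left(\langle Y_j^2 \rangle - \langle Y_j \rangle^2\right)}}$$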

where i and j are the numerical labels of the indices, $Y_i = \ln S_i(t) - \ln S_i(t-1)$, and $S_i(t)$ is the last value of trading week t for index i. The correlation coefficient is computed between all possible pairs of indices present in the database. The statistical average $\langle \cdot \rangle$ is a temporal average performed over all the trading weeks of the investigated time period. We then obtain the n x n matrix of correlation coefficients for weekly logarithmic index differences (which almost coincide with index returns). Correlation matrices have recently been investigated within the framework of random matrix theory. Here we take a different perspective and use a method introduced in earlier work. Specifically, we assume that the subdominant ultrametric space associated with a metric distance may reveal part of the economic information stored in the time series. This is obtained by defining a quantitative distance between each pair of elements i and j, $d(i,j) = \sqrt{2(1 - \rho_{ij})}$, and then using this distance matrix D to determine the minimum spanning tree (MST) connecting the n indices. The MST, a concept from graph theory, allows one to obtain, in a direct and unique way, the subdominant ultrametric space and the hierarchical organization of the elements (indices in our case) of the investigated data set. The subdominant ultrametric space has been fruitfully used in the description of frustrated complex systems, the archetype of which is a spin glass.
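The correlation coefficient and the distance $d(i,j)$ can be sketched in plain Python as follows. The averages are simple means over the common weeks, and the names `correlation` and `distance_matrix` are illustrative, not taken from the original work. The distance is 0 for perfectly correlated series, $\sqrt{2}$ for uncorrelated ones and 2 for perfectly anticorrelated ones; the `max(0.0, ...)` guard is an added assumption that only protects against rounding when $\rho_{ij}$ comes out numerically slightly above 1.

```python
import math


def correlation(yi, yj):
    """rho_ij = (<Yi Yj> - <Yi><Yj>) / sqrt((<Yi^2> - <Yi>^2)(<Yj^2> - <Yj>^2)),
    where <.> is a plain average over the common weeks."""
    n = len(yi)
    mi, mj = sum(yi) / n, sum(yj) / n
    mij = sum(a * b for a, b in zip(yi, yj)) / n
    mii = sum(a * a for a in yi) / n
    mjj = sum(b * b for b in yj) / n
    return (mij - mi * mj) / math.sqrt((mii - mi ** 2) * (mjj - mj ** 2))


def distance_matrix(series):
    """series: list of equal-length weekly return lists, one per index.
    Returns the symmetric matrix D with d(i, j) = sqrt(2 (1 - rho_ij))."""
    n = len(series)
    D = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            rho = correlation(series[i], series[j])
            D[i][j] = D[j][i] = math.sqrt(max(0.0, 2.0 * (1.0 - rho)))
    return D
```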

In the rest of this letter we show that the taxonomies found by considering the subdominant ultrametric matrices $D^<$ associated with the distance matrices D, obtained from different sets of quasi-synchronous time series investigated in local currencies or in USA Dollars, are directly interpretable.
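A sketch of how the MST and the associated subdominant ultrametric can be computed from a distance matrix D. It uses Kruskal's algorithm, which is one standard way to build an MST (the text does not say which algorithm was used), together with the property that the subdominant ultrametric distance $d^<(i,j)$ is the largest edge on the MST path joining i and j. The function names are hypothetical.

```python
def minimum_spanning_tree(D):
    """Kruskal's algorithm on a full distance matrix; returns (d, i, j) edges."""
    n = len(D)
    parent = list(range(n))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    edges = sorted((D[i][j], i, j) for i in range(n) for j in range(i + 1, n))
    tree = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:          # the edge joins two separate components
            parent[ri] = rj
            tree.append((d, i, j))
    return tree


def subdominant_ultrametric(D):
    """d<(i, j) = largest edge on the MST path joining i and j."""
    n = len(D)
    adj = {k: [] for k in range(n)}
    for d, i, j in minimum_spanning_tree(D):
        adj[i].append((j, d))
        adj[j].append((i, d))
    U = [[0.0] * n for _ in range(n)]
    for src in range(n):
        # Depth-first walk from src, carrying the largest edge met so far.
        stack, seen = [(src, 0.0)], {src}
        while stack:
            node, worst = stack.pop()
            U[src][node] = worst
            for nxt, d in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append((nxt, max(worst, d)))
    return U
```

For four points with MST edges 0-1 (1.0), 2-3 (2.0) and 1-2 (3.0), every pair across the two groups {0, 1} and {2, 3} gets $d^< = 3.0$, which is the hierarchical merging level read off the tree.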

We first investigated the set of 24 indices of 20 different countries recorded during the period 1988-1996. We divide the entire period into six partially overlapping four-year periods. The first covers the years 1988-1991, the second 1989-1992, and so on. Each four-year period comprises 207 or 208 weekly records for each time series. In all the periods we detect distinct clusters of North-American, European and Asia-Pacific stock indices. The North-American cluster is rather stable over the years and includes the USA indices Dow Jones 30, Standard & Poor's 500, Nasdaq 100 and Nasdaq Composite. The European cluster grows in size over time: in the first period it is formed by Amsterdam AEX, Paris CAC40, Frankfurt DAX and London FTSE, and in the last period it comprises FTSE, AEX, DAX, CAC40, Madrid General and Oslo General. The Milan Comit index always stays outside the European cluster in the investigated periods. This is not so surprising, because Italy was the only large European economy rather far from the so-called Maastricht parameters during that period. The Asian-Pacific cluster also expands as time goes on. It starts as a Kuala Lumpur Comp., Singapore Straits Times Industrial and Bangkok SET cluster and ends as a Kuala Lumpur, Singapore, Hong Kong Hang-Seng, Bangkok, Australia All Ordinary, Jakarta Comp. and Philippines Comp. cluster. Japanese stock indices do not join the Asian-Pacific cluster, and Japan behaves as a poorly linked country. The same occurs for the BSE30 index (of India) and the South-American indices.
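The six partially overlapping four-year windows can be generated with a short helper, shown here only as an illustration; the helper names are hypothetical, and assigning each weekly return to a calendar year (for example, the year of the week's last trading day) is an assumption. Each window would then go through the correlation, MST and ultrametric steps separately.

```python
def overlapping_windows(first_year, last_year, length=4):
    """Windows of `length` calendar years shifted by one year, e.g. 1988-1991,
    1989-1992, and so on, ending with the window that closes at last_year."""
    return [(start, start + length - 1)
            for start in range(first_year, last_year - length + 2)]


def select_window(weekly, window):
    """weekly: list of (year, Y) pairs; keeps the Y values inside the window."""
    first, last = window
    return [y for year, y in weekly if first <= year <= last]


windows = overlapping_windows(1988, 1996)
print(windows)  # six windows, from (1988, 1991) to (1993, 1996)
```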

In Fig. 1 we show the hierarchical trees obtained for the first and the last averaging time periods. Clusters are observed in both periods, but the tree of the last period has larger clusters. In summary, our study shows that regional links between different economies emerge directly from the time series. Moreover, we detect an increase in the size of the observed clusters and a relative stability of the clusters over the years.
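To read clusters off a hierarchical tree one can cut it at a chosen distance. This is a minimal sketch that assumes the subdominant ultrametric matrix is already available; the threshold value is an assumption, since the text does not state how the clusters in the figures were delimited. Because an ultrametric satisfies $d^<(i,k) \le \max[d^<(i,j), d^<(j,k)]$, the relation "$d^<(i,j) \le$ threshold" is an equivalence relation, so each cluster is simply the set of indices within the threshold of any one of its members.

```python
def clusters_at(U, threshold):
    """Groups the indices whose subdominant ultrametric distance is at most
    `threshold`. U is assumed to be an ultrametric (e.g. the MST-based d<)."""
    n = len(U)
    label = [-1] * n
    groups = []
    for i in range(n):
        if label[i] == -1:
            # Ultrametricity makes this group closed: no member links outside.
            members = [j for j in range(n) if U[i][j] <= threshold]
            for j in members:
                label[j] = len(groups)
            groups.append(members)
    return groups


U = [[0.0, 1.0, 3.0, 3.0],
     [1.0, 0.0, 3.0, 3.0],
     [3.0, 3.0, 0.0, 2.0],
     [3.0, 3.0, 2.0, 0.0]]
print(clusters_at(U, 2.5))  # [[0, 1], [2, 3]]
```

Lowering the threshold splits the tree into smaller groups and raising it merges them, which mirrors reading the hierarchical tree at different heights.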

With the aim of expanding this analysis to one of the largest sets of indices available today, we consider the set of 51 world indices computed by MSCI. For such a large set of indices the point of view of the investor becomes crucial. In other words, it is important to consider the problem also from the perspective of an international investor simultaneously monitoring the various markets. Several aspects of the different countries need to be taken into account to make an appropriate comparison; they include differences in currency values, levels of taxation, etc. Here we consider the most important of these differences, namely the fact that an international investor needs to compare the performances of different stock markets by using one reference currency. To evaluate the impact of a change of currency in the computation of indices, we consider the 51 MSCI country indices both in local currencies and in USA Dollars.
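Re-expressing a local-currency index in USA Dollars amounts to dividing it by the exchange rate. A sketch, assuming hypothetically that $X(t)$ is the number of local currency units per USA Dollar at the end of week t (MSCI computes its USD indices itself; this is only an illustration): the USD log return is then the local log return minus $\ln X(t) - \ln X(t-1)$, so currency fluctuations enter every pairwise correlation.

```python
import math


def usd_log_returns(local_closes, local_per_usd):
    """Weekly log returns of an index re-expressed in USA Dollars.

    local_closes[t]: index closing value in local currency at week t.
    local_per_usd[t]: hypothetical exchange rate, local units per USD, week t."""
    usd = [s / x for s, x in zip(local_closes, local_per_usd)]
    return [math.log(b) - math.log(a) for a, b in zip(usd, usd[1:])]


# A 10% local gain fully offset by a 10% depreciation of the local currency.
r = usd_log_returns([100.0, 110.0], [2.0, 2.2])
print(r)  # essentially [0.0]
```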

The 51 indices belong to 51 different countries located in all continents. They comprise so-called emerged and emerging markets. The indices can be found at the web site http://www.mscidata.com. The data are daily and cover the period 1996-1999. In Fig. 2a we show the result of our analysis performed by investigating weekly closing data in local currencies during the period 1996-1999. Four distinct clusters are detected (indicated at the bottom of the figure by a solid line). Cluster number one is essentially a North-American (green lines indicating the USA and Canada indices) and European (blue lines) cluster. It contains only one country index from the Asia-Pacific area, Australia (red line). Cluster number two comprises 4 South-American country indices, cluster number three is composed of 6 Asia-Pacific country indices, and the small cluster number four comprises India and Pakistan. The only world region that does not explicitly show index clustering is Africa-Middle East (purple lines). However, it is worth noting that several of these country indices are found at the extreme right of the hierarchical tree, namely they are all quite far from any other country. Once again the Japanese index is disconnected from the Asia-Pacific cluster and is observed at the external edge of the South-American cluster. Among European countries, those outside cluster one are the Czech Republic, Greece, Turkey and Luxembourg. Of these four countries only Luxembourg is considered by MSCI an emerged market.

The same analysis is then repeated for the same indices in the same period, but using indices computed in USA Dollars. The hierarchical tree of this investigation is shown in Fig. 2b. The overall structure observed in Fig. 2a is conserved, but some relevant changes are detected. For example, the Australian index leaves cluster one and links together with New Zealand in cluster three of this figure. Japan still lies far from the Asian-Pacific cluster, now being the first red line after it; the small India-Pakistan cluster disappears; and Peru links at the edge of cluster one. In summary, the results of our analysis show that computing the indices in a single reference currency can modify the obtained hierarchical structure. However, the changes detected in the specific investigated period are not dramatic and are limited to a few countries.

To verify whether the nonsynchronous recording of daily data indeed affects our findings, we also determine the hierarchical tree for daily closing changes of the same set of indices used to obtain the tree of Fig. 2b. This new hierarchical tree shows the same overall structure observed in the tree of Fig. 2a, but with a number of different links which are probably induced by the use of nonsynchronous time series. Specifically, we observe that almost all the American indices cluster together (Brazil, Argentina, Mexico, USA, Canada and Peru) and that South Africa clusters with the (in this case purely) European cluster.

In conclusion, we have shown that sets of stock index time series located all over the world can be used to extract economic information about the links between different economies, provided that the effects of the nonsynchronous nature of the time series and of the different currencies used to compute the indices are properly taken into account.
FIG. 1. (a) Hierarchical tree obtained starting from the correlation coefficients $\rho_{ij}$ of the set of 24 stock indices of weekly data. The correlation coefficient is obtained by averaging the index changes during the time period 1988-1991. Each line refers to an index. The color of each index indicates the region of its country: Americas (green), Asia-Pacific (red) and Europe (blue). The regional clusters of North America (cluster number one, indices: USA DJIA, S&P 500, NASDAQ Comp., Nasdaq 100 and Canadian Toronto SE300) and Europe (cluster number two, indices: Amsterdam, DAX, CAC40 and FTSE) are detected. A third cluster is also present, but it mixes indices of different world regions (indices: TOPIX, Nikkei 225, Madrid General, Kuala Lumpur, Singapore, Bangkok SET and Milan Comit). The remaining indices are Hang Seng, Oslo, Australia All Ord., Philippines, Mexico IPC, Chile IGPA, Jakarta and Bombay BSE30, from left to right respectively. (b) As in (a) but for the time period 1993-1996. The clusters detected in this period are larger and more homogeneous. Specifically, cluster one is a Japanese cluster (TOPIX and Nikkei 225 indices), cluster number two contains North-American indices as in (a), cluster three is the Asia-Pacific cluster consisting of Kuala Lumpur, Singapore, Hang Seng, Bangkok SET, Australia All Ord., Jakarta and Philippines indices, whereas cluster four is the European cluster formed by FTSE, Amsterdam, DAX, CAC40, Madrid General and Oslo. The remaining indices are Comit, Mexico IPC, Chile IGPA and BSE30, from left to right respectively.
FIG. 2. (a) Hierarchical tree obtained starting from the correlation coefficients $\rho_{ij}$ of the set of 51 MSCI indices of weekly data computed in local currencies. The correlation coefficient is obtained by averaging the index changes during the time period 1996-1999. Each line refers to an index. Colors are coded as in Fig. 1, but in addition Africa-Middle East countries are present and their color is purple. Four clusters are observed. Cluster number one contains the France, Germany, Sweden, Netherlands, Switzerland, Spain, USA, Canada, Great Britain, Finland, Italy, Australia, Belgium, Norway, Austria, Denmark, Russia, Ireland, Portugal, Poland and Hungary indices. It is a North-American and European cluster with the addition of the Australian index. Cluster two is composed of the Chile, Brazil, Argentina and Mexico indices, all belonging to South-American countries. Externally linked to the previous two clusters we find the Peru, Japan, Czech Republic and New Zealand indices. Cluster three is an Asian-Pacific cluster where we find the Singapore, Hong Kong, Philippines, Malaysia, Thailand and China indices. Just after this cluster we have the Indonesia, Greece, Venezuela, Turkey, Taiwan, Sri Lanka, Colombia, South Africa and Luxembourg indices. The last cluster is the small cluster of the India and Pakistan indices. The remaining country indices are Korea, Egypt, Israel, Jordan and Morocco. (b) As in (a) but for indices computed in USA Dollars. The country indices are, from left to right: France, Germany, Sweden, Finland, Netherlands, USA, Canada, Spain, Great Britain, Norway, Italy, Poland, Hungary, Switzerland, Russia, Belgium, Austria, Portugal, Denmark, Peru (end of cluster one), Mexico, Argentina, Brazil and Chile (end of cluster two), New Zealand and Australia (cluster three), Ireland, Venezuela, the Czech Republic, Singapore (start of cluster four), Hong Kong, Philippines, Thailand, China and Indonesia (end of cluster four), Taiwan, Malaysia, Greece, Japan, Korea, South Africa, Colombia, Turkey, Sri Lanka, India, Pakistan, Luxembourg, Egypt, Israel, Jordan and Morocco.